-
Notifications
You must be signed in to change notification settings - Fork 675
chore: mm epd disagg #4151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
chore: mm epd disagg #4151
Conversation
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
krishung5
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm! Please remove all the debug logs and the old disagg_multimodal.sh file in the example/multimodal folder before merging.
Signed-off-by: ayushag <[email protected]>
WalkthroughAdded multimodal decode worker support to the vLLM configuration system, updated handler initialization logic to route between decode and prefill+worker handlers based on configuration, introduced a new disaggregated multimodal serving script, and removed an obsolete orchestration script. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Key areas requiring attention:
Poem
Pre-merge checks❌ Failed checks (2 warnings, 1 inconclusive)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
examples/backends/vllm/launch/disagg_multimodal.sh (2)
73-104: Consider adding health checks between component launches.The script launches all components in rapid succession without waiting for dependencies to be ready. For example, the encode worker (line 89) may start before the processor (line 79) is ready to receive connections.
Consider one of these approaches:
Option 1: Add sleep delays between launches
# Start processor echo "Starting processor..." python -m dynamo.vllm --multimodal-processor --model $MODEL_NAME --mm-prompt-template "$PROMPT_TEMPLATE" & +sleep 5 # Configure GPU memory optimization for specific modelsOption 2: Add health check polling (more robust)
After each component launch, add a function to poll its health endpoint:
wait_for_service() { local port=$1 local max_attempts=30 for i in $(seq 1 $max_attempts); do if curl -s "http://localhost:$port/health" > /dev/null 2>&1; then return 0 fi sleep 1 done echo "Service on port $port failed to start" exit 1 }
87-97: Document or validate GPU availability.The script assumes GPUs 1, 2, and 3 are available but doesn't validate this. Consider adding a GPU count check at the start or documenting the minimum GPU requirement.
Add GPU validation:
# After line 71, before starting components GPU_COUNT=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l) if [ $GPU_COUNT -lt 4 ]; then echo "Error: This script requires at least 4 GPUs, but only $GPU_COUNT found" exit 1 fiOr document the requirement in the help text:
echo "Disaggregated multimodal serving with separate Encode/Prefill/Decode workers" echo "" +echo "Requirements: At least 4 NVIDIA GPUs" +echo "" echo "Options:"
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
components/src/dynamo/vllm/args.py(5 hunks)components/src/dynamo/vllm/main.py(3 hunks)examples/backends/vllm/launch/disagg_multimodal.sh(1 hunks)examples/multimodal/launch/disagg.sh(0 hunks)
💤 Files with no reviewable changes (1)
- examples/multimodal/launch/disagg.sh
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.
Applied to files:
components/src/dynamo/vllm/main.py
📚 Learning: 2025-06-05T01:04:24.775Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The `create_endpoint` method in `WorkerMetricsPublisher` has backward compatibility maintained through pyo3 signature annotation `#[pyo3(signature = (component, dp_rank = None))]`, making the `dp_rank` parameter optional with a default value of `None`.
Applied to files:
components/src/dynamo/vllm/main.py
🧬 Code graph analysis (1)
components/src/dynamo/vllm/main.py (3)
components/src/dynamo/vllm/multimodal_handlers/worker_handler.py (2)
MultimodalDecodeWorkerHandler(25-81)MultimodalPDWorkerHandler(84-260)lib/bindings/python/src/dynamo/_core.pyi (5)
namespace(42-46)component(88-92)endpoint(117-121)client(154-158)wait_for_instances(193-200)lib/bindings/python/rust/lib.rs (5)
namespace(491-496)component(815-821)endpoint(703-709)client(785-799)wait_for_instances(840-849)
🔇 Additional comments (9)
components/src/dynamo/vllm/args.py (3)
72-72: LGTM! Consistent flag implementation.The
multimodal_decode_workerflag is properly integrated across class attributes, CLI arguments, exclusivity checks, error messaging, and config propagation.Also applies to: 174-178, 227-227, 232-232, 274-274
245-249: LGTM! Clear component routing for decode worker.The decode worker correctly uses "decoder" as the component name to enable prefill worker connections in disaggregated mode.
250-253: LGTM! Critical routing for multimodal prefill worker.The multimodal prefill worker correctly uses "backend" as the component name to maintain the encoder→backend connection, which differs from the standard prefill worker component naming. The comment clearly explains this design decision.
examples/backends/vllm/launch/disagg_multimodal.sh (3)
1-6: LGTM! Proper script initialization.Good use of
set -efor early exit on errors andtrapfor cleanup of background processes.
12-47: LGTM! Clear CLI interface.The command-line argument parsing is well-structured with helpful usage information and examples.
49-63: LGTM! Sensible template defaults.Model-specific prompt templates are properly defined with a clear fallback mechanism for unsupported models.
components/src/dynamo/vllm/main.py (3)
32-32: LGTM! Clean import addition.
109-113: LGTM! Correct routing for decode worker.The condition appropriately routes
multimodal_decode_workertoinit_multimodal_worker, where handler selection occurs based on the specific worker type.
639-660: LGTM! Correct handler selection and client wiring.The logic correctly distinguishes between:
- Decode worker (
multimodal_decode_worker=True): UsesMultimodalDecodeWorkerHandlerwithout needing a downstream client- Prefill worker (
is_prefill_worker=True): Createsdecode_worker_clientand passes it toMultimodalPDWorkerHandlerfor disaggregated modeThe handler signatures match the constructor definitions, and
wait_for_instances()is properly called before use (line 649).
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Release Notes
New Features
Chores